The folders are organized in the following manner:

-- data/ contains both the intermediate files generated by the code
(in data/tmp_files) and the results (in data/results). It ships with
the files created in a prior execution, so any of the steps can be
replicated directly without necessarily following the order below.
Running the code will overwrite the files from that prior execution.

-- table_1/ contains code and input data for calculating the
topic-word lists for Members of Congress, Governors and Mayors. It
should produce output qualitatively similar to the output from which
the examples in Table 1 were extracted.

-- figure_1/ contains code and input data for running the bootstrap
analyses, calculating topic distances, and aggregating the results
into the data plotted in Figure 1. The code also produces a version
of that figure, although formatted somewhat differently.

-- cs_score/ contains code and some data needed to calculate the
Congressional Similarity (CS) scores, which are written to a csv file.

It is assumed that the code in figure_1/ is executed before the code
in cs_score/.

A few general notes:
1. The input data has been preprocessed into serialized Python objects
("pickle" files), which are included in the folders. The preprocessing
was standard: lemmatization, stopword removal, etc.

2. Many of the scripts are computationally intensive and take several
hours (at a minimum) to run. We recommend at least an 8-core machine
with 16 GB of RAM.

3. While the code will run (and overwrite existing files as needed)
without any of the files in data/, some of it relies on the directory
structure within data/, so that structure should be maintained.
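Notes 1 and 3 (loading the pickled inputs and keeping the data/
layout intact) can be sketched in Python as below. This is a minimal
sketch, not part of the package: the helper names ensure_data_dirs and
load_pickle are hypothetical, and it recreates only the two subfolders
named above (data/tmp_files and data/results); any further structure
described in the per-folder readmes would have to be added by hand.
Unpickling objects that contain numpy arrays or gensim models requires
those packages to be importable, and pickle files should only be
loaded from trusted sources.

```python
import os
import pickle

# Assumed layout: only the two subfolders named in this readme.
DATA_SUBDIRS = [os.path.join("data", "tmp_files"), os.path.join("data", "results")]


def ensure_data_dirs(root="."):
    """Create data/tmp_files and data/results under root if they are missing."""
    created = []
    for sub in DATA_SUBDIRS:
        path = os.path.join(root, sub)
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        created.append(path)
    return created


def load_pickle(path):
    """Load a single serialized Python object from a pickle file."""
    # Only unpickle files from a trusted source: pickle can execute code.
    with open(path, "rb") as f:
        return pickle.load(f)
```

For example, calling ensure_data_dirs() from the top-level folder of a
fresh checkout restores the expected subfolders, and load_pickle(path)
can be pointed at any of the included pickle files to inspect it.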

More detail is provided in the readme file within each folder.

The development environment was Python 3.7.0, numpy 1.15.1, scipy
1.1.0 and gensim 3.6.0. The code has also been tested with Python
3.8.3, numpy 1.18.5, scipy 1.5.0 and gensim 4.0.1.
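To compare a local setup against the development versions listed
above, a short check like the following may help. It is an
illustrative sketch with hypothetical function names: it reads each
package's __version__ attribute and reports packages that cannot be
imported as "not installed". It only reports versions; it does not
decide whether a given combination will work.

```python
import importlib
import sys

# Development versions listed in this readme.
DEV_VERSIONS = {"numpy": "1.15.1", "scipy": "1.1.0", "gensim": "3.6.0"}


def installed_versions(packages):
    """Return {package: version string, or None if it cannot be imported}."""
    found = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", None)
    return found


def report(dev_versions=DEV_VERSIONS):
    """Build one line per component: installed version vs. dev version."""
    lines = ["python: %d.%d.%d (dev: 3.7.0)" % tuple(sys.version_info[:3])]
    for name, version in installed_versions(dev_versions).items():
        lines.append("%s: %s (dev: %s)"
                     % (name, version or "not installed", dev_versions[name]))
    return lines
```

Running print("\n".join(report())) prints one line per component,
which makes it easy to spot a mismatch with the versions above before
starting a multi-hour run.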